Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 11684068 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 5.0 GiB |
| Average record size in memory | 459.2 B |
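The two memory figures above can be cross-checked against each other: the average record size times the number of observations should land near the reported total. A minimal sketch using only values copied from the table (variable names are illustrative, and the reported figures are rounded, so only approximate agreement is expected):

```python
# Values copied from the "Dataset statistics" table.
n_observations = 11_684_068
avg_record_bytes = 459.2  # reported average record size, rounded to 0.1 B

# Estimated total size: number of records times average bytes per record.
total_bytes = n_observations * avg_record_bytes
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_gib:.1f} GiB")  # prints "5.0 GiB"
```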
Variable types
| Categorical | 6 |
|---|---|
| Numeric | 9 |
| DateTime | 1 |
Alerts
| Alert | Type |
|---|---|
| InventoryId has a high cardinality: 203508 distinct values | High cardinality |
| Description has a high cardinality: 7215 distinct values | High cardinality |
| VendorName has a high cardinality: 118 distinct values | High cardinality |
| City has a high cardinality: 67 distinct values | High cardinality |
| Brand is highly correlated with Size and 1 other fields | High correlation |
| SalesQuantity is highly correlated with SalesDollars and 1 other fields | High correlation |
| SalesDollars is highly correlated with SalesQuantity and 1 other fields | High correlation |
| SalesPrice is highly correlated with SalesDollars | High correlation |
| Volume is highly correlated with Size | High correlation |
| Classification is highly correlated with Brand and 1 other fields | High correlation |
| ExciseTax is highly correlated with SalesQuantity and 1 other fields | High correlation |
| Size is highly correlated with Brand and 2 other fields | High correlation |
| Store is highly correlated with City | High correlation |
| City is highly correlated with Store | High correlation |
| SalesQuantity is highly skewed (γ1 = 26.01507302) | Skewed |
| SalesDollars is highly skewed (γ1 = 39.84989273) | Skewed |
| SalesPrice is highly skewed (γ1 = 42.4717086) | Skewed |
| ExciseTax is highly skewed (γ1 = 29.34954166) | Skewed |
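The skewness alerts above report γ1, the third standardized moment. Large positive values mean a long right tail: most rows are small and a few are very large. The following is a minimal sketch of the plain moment-based definition. The profiling tool may use a bias-corrected estimator, so its values can differ slightly on small samples. The function name and the toy data are hypothetical and not taken from the dataset.

```python
def skewness(values):
    """Moment-based skewness: g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data, chosen only to illustrate the effect of a long tail.
symmetric = [1, 2, 3, 4, 5]
long_tailed = [1] * 95 + [2] * 4 + [1000]  # one extreme value

print(skewness(symmetric))    # zero for symmetric data
print(skewness(long_tailed))  # strongly positive
```

A single extreme value among many small ones is enough to push the skewness far above zero, which matches the pattern in the sales columns: a median quantity of 1 against a maximum of 1231.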
Reproduction
| Analysis started | 2022-10-05 01:56:04.206800 |
|---|---|
| Analysis finished | 2022-10-05 02:10:29.626483 |
| Duration | 14 minutes and 25.42 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
InventoryId
Categorical
| Distinct | 203508 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 913.6 MiB |
Length
| Max length | 22 |
|---|---|
| Median length | 20 |
| Mean length | 16.99202743 |
| Min length | 10 |
Characters and Unicode
| Total characters | 198536004 |
|---|---|
| Distinct characters | 35 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
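As a rough illustration of the statement above, Python's standard `unicodedata` module exposes the general category of each character as a two-letter code, for example `Lu` (Uppercase Letter), `Nd` (Decimal Number), and `Pc` (Connector Punctuation). The sketch below counts categories over two identifiers that appear in this report. The helper name `category_counts` is hypothetical. The standard library does not expose script or block properties, so those would need a third-party package.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two identifiers copied from the report's sample and common-value rows.
sample = ["1_HARDERSFIELD_1004", "68_SOLARIS_5270"]
counts = category_counts(sample)

for code, count in counts.most_common():
    print(code, count)
```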
Unique
| Unique | 5356 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 1_HARDERSFIELD_1004 |
|---|---|
| 2nd row | 1_HARDERSFIELD_1004 |
| 3rd row | 1_HARDERSFIELD_1004 |
| 4th row | 1_HARDERSFIELD_1004 |
| 5th row | 1_HARDERSFIELD_1004 |
Common Values
| Value | Count | Frequency (%) |
| 68_SOLARIS_5270 | 344 | < 0.1% |
| 13_TARMSWORTH_8064 | 341 | < 0.1% |
| 35_HALIVAARA_4157 | 339 | < 0.1% |
| 35_HALIVAARA_4135 | 339 | < 0.1% |
| 59_CLAETHORPES_3606 | 336 | < 0.1% |
| 39_EASTHALLOW_8111 | 336 | < 0.1% |
| 49_GARIGILL_8184 | 334 | < 0.1% |
| 18_FURNESS_1892 | 333 | < 0.1% |
| 13_TARMSWORTH_3837 | 332 | < 0.1% |
| 13_TARMSWORTH_3606 | 330 | < 0.1% |
| Other values (203498) | 11680704 |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| 55_dry | 229503 | 1.9% |
| 56_beggar's | 143420 | 1.2% |
| 42_black | 94665 | 0.8% |
| 47_pella's | 87608 | 0.7% |
| 26_knife's | 25453 | 0.2% |
| 68_solaris_5270 | 344 | < 0.1% |
| 13_tarmsworth_8064 | 341 | < 0.1% |
| 35_halivaara_4157 | 339 | < 0.1% |
| 35_halivaara_4135 | 339 | < 0.1% |
| 39_easthallow_8111 | 336 | < 0.1% |
| Other values (203503) | 11682369 |
Most occurring characters
| Value | Count | Frequency (%) |
| _ | 23368136 | 11.8% |
| E | 12492906 | 6.3% |
| R | 10407731 | 5.2% |
| 3 | 9859820 | 5.0% |
| N | 9303426 | 4.7% |
| A | 8837376 | 4.5% |
| 1 | 8422395 | 4.2% |
| 2 | 8217364 | 4.1% |
| 6 | 7963798 | 4.0% |
| 4 | 7956470 | 4.0% |
| Other values (25) | 91706582 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 101943978 | |
| Decimal Number | 72386760 | |
| Connector Punctuation | 23368136 | 11.8% |
| Space Separator | 580649 | 0.3% |
| Other Punctuation | 256481 | 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.2% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.7% |
| O | 6906316 | 6.8% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.4% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (12) | 26629980 |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 9859820 | |
| 1 | 8422395 | |
| 2 | 8217364 | |
| 6 | 7963798 | |
| 4 | 7956470 | |
| 5 | 7388284 | |
| 7 | 6916425 | |
| 8 | 6446572 | |
| 9 | 4888778 | |
| 0 | 4326854 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 23368136 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 580649 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 256481 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 101943978 | |
| Common | 96592026 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.2% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.7% |
| O | 6906316 | 6.8% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.4% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (12) | 26629980 |
Common
| Value | Count | Frequency (%) |
| _ | 23368136 | |
| 3 | 9859820 | |
| 1 | 8422395 | 8.7% |
| 2 | 8217364 | 8.5% |
| 6 | 7963798 | 8.2% |
| 4 | 7956470 | 8.2% |
| 5 | 7388284 | 7.6% |
| 7 | 6916425 | 7.2% |
| 8 | 6446572 | 6.7% |
| 9 | 4888778 | 5.1% |
| Other values (3) | 5163984 | 5.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 198536004 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| _ | 23368136 | 11.8% |
| E | 12492906 | 6.3% |
| R | 10407731 | 5.2% |
| 3 | 9859820 | 5.0% |
| N | 9303426 | 4.7% |
| A | 8837376 | 4.5% |
| 1 | 8422395 | 4.2% |
| 2 | 8217364 | 4.1% |
| 6 | 7963798 | 4.0% |
| 4 | 7956470 | 4.0% |
| Other values (25) | 91706582 |
Store
Real number (ℝ≥0)
| Distinct | 79 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.58414561 |
| Minimum | 1 |
|---|---|
| Maximum | 79 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 23 |
| median | 46 |
| Q3 | 66 |
| 95-th percentile | 76 |
| Maximum | 79 |
| Range | 78 |
| Interquartile range (IQR) | 43 |
Descriptive statistics
| Standard deviation | 23.54349106 |
|---|---|
| Coefficient of variation (CV) | 0.540184756 |
| Kurtosis | -1.251263548 |
| Mean | 43.58414561 |
| Median Absolute Deviation (MAD) | 21 |
| Skewness | -0.2055342007 |
| Sum | 509240121 |
| Variance | 554.2959713 |
| Monotonicity | Not monotonic |
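Several of the figures in the two tables above are derived from the others, which allows a quick consistency check. A minimal sketch using only values copied from those tables (variable names are illustrative):

```python
# Values copied from the quantile and descriptive statistics tables.
mean = 43.58414561
std = 23.54349106
q1, q3 = 23, 66
minimum, maximum = 1, 79

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 6), round(variance, 4), iqr, value_range)
```

The results agree with the reported CV (0.540184756), variance (554.2959713), IQR (43), and range (78) up to rounding.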
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 76 | 439940 | 3.8% |
| 73 | 406820 | 3.5% |
| 38 | 381558 | 3.3% |
| 34 | 381510 | 3.3% |
| 66 | 375164 | 3.2% |
| 67 | 344822 | 3.0% |
| 69 | 315666 | 2.7% |
| 50 | 287218 | 2.5% |
| 60 | 267212 | 2.3% |
| 15 | 244581 | 2.1% |
| Other values (69) | 8239577 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 205824 | |
| 2 | 157945 | |
| 3 | 17988 | 0.2% |
| 4 | 113137 | |
| 5 | 60499 | 0.5% |
| 6 | 173943 | |
| 7 | 162033 | |
| 8 | 121033 | |
| 9 | 171195 | |
| 10 | 207668 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 79 | 166902 | 1.4% |
| 78 | 92875 | 0.8% |
| 77 | 117887 | 1.0% |
| 76 | 439940 | |
| 75 | 127344 | 1.1% |
| 74 | 173253 | 1.5% |
| 73 | 406820 | |
| 72 | 154523 | 1.3% |
| 71 | 157970 | 1.4% |
| 70 | 77512 | 0.7% |
Brand
Real number (ℝ≥0)
| Distinct | 8005 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11971.51603 |
| Minimum | 58 |
|---|---|
| Maximum | 90090 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 58 |
|---|---|
| 5-th percentile | 1376 |
| Q1 | 3679 |
| median | 6216 |
| Q3 | 16656 |
| 95-th percentile | 40337 |
| Maximum | 90090 |
| Range | 90032 |
| Interquartile range (IQR) | 12977 |
Descriptive statistics
| Standard deviation | 12398.43042 |
|---|---|
| Coefficient of variation (CV) | 1.035660846 |
| Kurtosis | 0.6925634103 |
| Mean | 11971.51603 |
| Median Absolute Deviation (MAD) | 2972 |
| Skewness | 1.38949857 |
| Sum | 1.398760073 × 10^11 |
| Variance | 153721076.8 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 3606 | 21899 | 0.2% |
| 8111 | 21556 | 0.2% |
| 5111 | 20952 | 0.2% |
| 1892 | 20745 | 0.2% |
| 4157 | 19175 | 0.2% |
| 4135 | 19095 | 0.2% |
| 3896 | 18760 | 0.2% |
| 8068 | 18755 | 0.2% |
| 2120 | 18608 | 0.2% |
| 3837 | 18534 | 0.2% |
| Other values (7995) | 11485989 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 58 | 2146 | < 0.1% |
| 60 | 821 | < 0.1% |
| 61 | 16 | < 0.1% |
| 62 | 2344 | < 0.1% |
| 63 | 2174 | < 0.1% |
| 72 | 336 | < 0.1% |
| 75 | 8 | < 0.1% |
| 77 | 6197 | |
| 79 | 4139 | |
| 82 | 15 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 90090 | 6 | < 0.1% |
| 90089 | 42 | |
| 90088 | 18 | |
| 90087 | 14 | < 0.1% |
| 90086 | 24 | |
| 90085 | 6 | < 0.1% |
| 90084 | 4 | < 0.1% |
| 90082 | 16 | < 0.1% |
| 90081 | 11 | < 0.1% |
| 90080 | 6 | < 0.1% |
Description
Categorical
| Distinct | 7215 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 958.9 MiB |
Length
| Max length | 28 |
|---|---|
| Median length | 22 |
| Mean length | 21.05547648 |
| Min length | 6 |
Characters and Unicode
| Total characters | 246013619 |
|---|---|
| Distinct characters | 75 |
| Distinct categories | 9 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 290 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Jim Beam w/2 Rocks Glasses |
|---|---|
| 2nd row | Jim Beam w/2 Rocks Glasses |
| 3rd row | Jim Beam w/2 Rocks Glasses |
| 4th row | Jim Beam w/2 Rocks Glasses |
| 5th row | Jim Beam w/2 Rocks Glasses |
Common Values
| Value | Count | Frequency (%) |
| Jack Daniels No 7 Black | 77657 | 0.7% |
| Jagermeister Liqueur | 74087 | 0.6% |
| Capt Morgan Spiced Rum | 73056 | 0.6% |
| Tito's Handmade Vodka | 70683 | 0.6% |
| Smirnoff 80 Proof | 70310 | 0.6% |
| Bacardi Superior Rum | 70010 | 0.6% |
| Kahlua | 69937 | 0.6% |
| Jim Beam | 68003 | 0.6% |
| Absolut 80 Proof | 67528 | 0.6% |
| Jose Cuervo Especial | 61128 | 0.5% |
| Other values (7205) | 10981669 |
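The Frequency (%) column is each count divided by the total number of observations, so cells that were lost in extraction can be recomputed. A minimal sketch using the counts from the table above and the observation count from the dataset statistics (the helper name `frequency_pct` is hypothetical):

```python
n_observations = 11_684_068  # from the dataset statistics

# Counts copied from the Common Values table above.
top_counts = {
    "Jack Daniels No 7 Black": 77_657,
    "Jagermeister Liqueur": 74_087,
    "Capt Morgan Spiced Rum": 73_056,
    "Tito's Handmade Vodka": 70_683,
    "Smirnoff 80 Proof": 70_310,
    "Bacardi Superior Rum": 70_010,
    "Kahlua": 69_937,
    "Jim Beam": 68_003,
    "Absolut 80 Proof": 67_528,
    "Jose Cuervo Especial": 61_128,
}
other_values = 10_981_669

def frequency_pct(count, total=n_observations):
    """Share of all observations, in percent."""
    return 100.0 * count / total

# The ten listed values plus "Other values" account for every row.
accounted = sum(top_counts.values()) + other_values

for name, count in top_counts.items():
    print(f"{name}: {frequency_pct(count):.1f}%")
print(f"Other values: {frequency_pct(other_values):.1f}%")
```

Under this calculation the missing "Other values" cell comes out at about 94.0%.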
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| vodka | 1277087 | 3.1% |
| svgn | 871713 | 2.1% |
| chard | 664559 | 1.6% |
| rum | 647514 | 1.6% |
| cab | 641638 | 1.6% |
| pnt | 630297 | 1.5% |
| cal | 399372 | 1.0% |
| black | 386037 | 0.9% |
| smirnoff | 379514 | 0.9% |
| red | 374024 | 0.9% |
| Other values (6640) | 34788489 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 29377457 | 11.9% |
| a | 20177049 | 8.2% |
| e | 18241330 | 7.4% |
| r | 15332560 | 6.2% |
| o | 15043223 | 6.1% |
| n | 13110019 | 5.3% |
| i | 12644026 | 5.1% |
| l | 11560673 | 4.7% |
| t | 8011308 | 3.3% |
| s | 7450879 | 3.0% |
| Other values (65) | 95065095 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 172494614 | |
| Uppercase Letter | 41272374 | 16.8% |
| Space Separator | 29377457 | 11.9% |
| Decimal Number | 1999725 | 0.8% |
| Other Punctuation | 746980 | 0.3% |
| Dash Punctuation | 99634 | < 0.1% |
| Math Symbol | 21787 | < 0.1% |
| Open Punctuation | 524 | < 0.1% |
| Close Punctuation | 524 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 20177049 | |
| e | 18241330 | |
| r | 15332560 | 8.9% |
| o | 15043223 | 8.7% |
| n | 13110019 | 7.6% |
| i | 12644026 | 7.3% |
| l | 11560673 | 6.7% |
| t | 8011308 | 4.6% |
| s | 7450879 | 4.3% |
| d | 7013627 | 4.1% |
| Other values (16) | 43909920 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 5991533 | |
| S | 4780870 | |
| B | 3871089 | 9.4% |
| R | 3226514 | 7.8% |
| M | 2821845 | 6.8% |
| V | 2535173 | 6.1% |
| P | 2342603 | 5.7% |
| G | 1849123 | 4.5% |
| T | 1820126 | 4.4% |
| D | 1427876 | 3.5% |
| Other values (16) | 10605622 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 508467 | |
| 1 | 463174 | |
| 8 | 301000 | |
| 7 | 174757 | 8.7% |
| 2 | 147531 | 7.4% |
| 5 | 133339 | 6.7% |
| 4 | 90702 | 4.5% |
| 9 | 84469 | 4.2% |
| 3 | 62441 | 3.1% |
| 6 | 33845 | 1.7% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 387986 | |
| & | 195035 | |
| / | 148838 | 19.9% |
| # | 5508 | 0.7% |
| * | 4233 | 0.6% |
| . | 3588 | 0.5% |
| ! | 1750 | 0.2% |
| % | 42 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 29377457 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 99634 |
Math Symbol
| Value | Count | Frequency (%) |
| + | 21787 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 524 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 524 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 213766988 | |
| Common | 32246631 | 13.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 20177049 | 9.4% |
| e | 18241330 | 8.5% |
| r | 15332560 | 7.2% |
| o | 15043223 | 7.0% |
| n | 13110019 | 6.1% |
| i | 12644026 | 5.9% |
| l | 11560673 | 5.4% |
| t | 8011308 | 3.7% |
| s | 7450879 | 3.5% |
| d | 7013627 | 3.3% |
| Other values (42) | 85182294 |
Common
| Value | Count | Frequency (%) |
| (space) | 29377457 | |
| 0 | 508467 | 1.6% |
| 1 | 463174 | 1.4% |
| ' | 387986 | 1.2% |
| 8 | 301000 | 0.9% |
| & | 195035 | 0.6% |
| 7 | 174757 | 0.5% |
| / | 148838 | 0.5% |
| 2 | 147531 | 0.5% |
| 5 | 133339 | 0.4% |
| Other values (13) | 409047 | 1.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 246013619 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 29377457 | 11.9% |
| a | 20177049 | 8.2% |
| e | 18241330 | 7.4% |
| r | 15332560 | 6.2% |
| o | 15043223 | 6.1% |
| n | 13110019 | 5.3% |
| i | 12644026 | 5.1% |
| l | 11560673 | 4.7% |
| t | 8011308 | 3.3% |
| s | 7450879 | 3.0% |
| Other values (65) | 95065095 |
Size
Categorical
| Distinct | 40 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 777.8 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 4.804751821 |
| Min length | 2 |
Characters and Unicode
| Total characters | 56139047 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 750mL |
|---|---|
| 2nd row | 750mL |
| 3rd row | 750mL |
| 4th row | 750mL |
| 5th row | 750mL |
Common Values
| Value | Count | Frequency (%) |
| 750mL | 6661610 | |
| 1.75L | 1976333 | 16.9% |
| 50mL | 1031306 | 8.8% |
| 1.5L | 734451 | 6.3% |
| 375mL | 595318 | 5.1% |
| Liter | 207512 | 1.8% |
| 3L | 153229 | 1.3% |
| 5L | 130108 | 1.1% |
| 187mL 4 Pk | 37100 | 0.3% |
| 500mL | 27898 | 0.2% |
| Other values (30) | 129203 | 1.1% |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
| 750ml | 6663894 | |
| 1.75l | 1976333 | 16.7% |
| 50ml | 1048105 | 8.8% |
| 1.5l | 734451 | 6.2% |
| 375ml | 595987 | 5.0% |
| liter | 207512 | 1.8% |
| 3l | 153229 | 1.3% |
| 5l | 130108 | 1.1% |
| pk | 78363 | 0.7% |
| 4 | 63483 | 0.5% |
| Other values (21) | 193194 | 1.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 11681553 | |
| 5 | 11195882 | |
| 7 | 9292961 | |
| m | 8463241 | |
| 0 | 7901892 | |
| 1 | 2797613 | 5.0% |
| . | 2713299 | 4.8% |
| 3 | 764206 | 1.4% |
| e | 207512 | 0.4% |
| r | 207512 | 0.4% |
| Other values (12) | 913376 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 32125476 | |
| Uppercase Letter | 11762431 | 21.0% |
| Lowercase Letter | 9374167 | 16.7% |
| Other Punctuation | 2715707 | 4.8% |
| Space Separator | 160591 | 0.3% |
| Math Symbol | 675 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 11195882 | |
| 7 | 9292961 | |
| 0 | 7901892 | |
| 1 | 2797613 | 8.7% |
| 3 | 764206 | 2.4% |
| 4 | 80062 | 0.2% |
| 8 | 55657 | 0.2% |
| 2 | 37203 | 0.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| m | 8463241 | |
| e | 207512 | 2.2% |
| r | 207512 | 2.2% |
| t | 207512 | 2.2% |
| i | 207512 | 2.2% |
| k | 78363 | 0.8% |
| z | 2515 | < 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 11681553 | |
| P | 78363 | 0.7% |
| O | 2515 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 2713299 | |
| / | 2408 | 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 160591 |
Math Symbol
| Value | Count | Frequency (%) |
| + | 675 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 35002449 | |
| Latin | 21136598 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 11195882 | |
| 7 | 9292961 | |
| 0 | 7901892 | |
| 1 | 2797613 | 8.0% |
| . | 2713299 | 7.8% |
| 3 | 764206 | 2.2% |
| (space) | 160591 | 0.5% |
| 4 | 80062 | 0.2% |
| 8 | 55657 | 0.2% |
| 2 | 37203 | 0.1% |
| Other values (2) | 3083 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| L | 11681553 | |
| m | 8463241 | |
| e | 207512 | 1.0% |
| r | 207512 | 1.0% |
| t | 207512 | 1.0% |
| i | 207512 | 1.0% |
| P | 78363 | 0.4% |
| k | 78363 | 0.4% |
| O | 2515 | < 0.1% |
| z | 2515 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 56139047 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| L | 11681553 | |
| 5 | 11195882 | |
| 7 | 9292961 | |
| m | 8463241 | |
| 0 | 7901892 | |
| 1 | 2797613 | 5.0% |
| . | 2713299 | 4.8% |
| 3 | 764206 | 1.4% |
| e | 207512 | 0.4% |
| r | 207512 | 0.4% |
| Other values (12) | 913376 | 1.6% |
SalesQuantity
Real number (ℝ≥0)
| Distinct | 351 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.558967048 |
| Minimum | 1 |
|---|---|
| Maximum | 1231 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 8 |
| Maximum | 1231 |
| Range | 1230 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 4.494622283 |
|---|---|
| Coefficient of variation (CV) | 1.756420539 |
| Kurtosis | 2770.417282 |
| Mean | 2.558967048 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 26.01507302 |
| Sum | 29899145 |
| Variance | 20.20162946 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 1 | 6395518 | |
| 2 | 2390305 | 20.5% |
| 3 | 990205 | 8.5% |
| 4 | 573608 | 4.9% |
| 5 | 305510 | 2.6% |
| 6 | 262681 | 2.2% |
| 7 | 143383 | 1.2% |
| 8 | 106698 | 0.9% |
| 12 | 82691 | 0.7% |
| 9 | 73134 | 0.6% |
| Other values (341) | 360335 | 3.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 6395518 | |
| 2 | 2390305 | 20.5% |
| 3 | 990205 | 8.5% |
| 4 | 573608 | 4.9% |
| 5 | 305510 | 2.6% |
| 6 | 262681 | 2.2% |
| 7 | 143383 | 1.2% |
| 8 | 106698 | 0.9% |
| 9 | 73134 | 0.6% |
| 10 | 60832 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1231 | 1 | |
| 1176 | 1 | |
| 1016 | 1 | |
| 1010 | 1 | |
| 807 | 1 | |
| 721 | 2 | |
| 720 | 2 | |
| 708 | 1 | |
| 697 | 1 | |
| 686 | 1 |
SalesDollars
Real number (ℝ≥0)
| Distinct | 9828 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.77872238 |
| Minimum | 0 |
|---|---|
| Maximum | 26061.14 |
| Zeros | 32 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.95 |
| Q1 | 10.99 |
| median | 18.99 |
| Q3 | 35.96 |
| 95-th percentile | 109.89 |
| Maximum | 26061.14 |
| Range | 26061.14 |
| Interquartile range (IQR) | 24.97 |
Descriptive statistics
| Standard deviation | 89.06332056 |
|---|---|
| Coefficient of variation (CV) | 2.489281747 |
| Kurtosis | 4128.714697 |
| Mean | 35.77872238 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 39.84989273 |
| Sum | 418041025.3 |
| Variance | 7932.275069 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 9.99 | 562822 | 4.8% |
| 12.99 | 403091 | 3.4% |
| 10.99 | 377270 | 3.2% |
| 19.99 | 303203 | 2.6% |
| 11.99 | 293380 | 2.5% |
| 14.99 | 283563 | 2.4% |
| 8.99 | 266572 | 2.3% |
| 13.99 | 263456 | 2.3% |
| 16.99 | 244674 | 2.1% |
| 17.99 | 242775 | 2.1% |
| Other values (9818) | 8443262 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 32 | < 0.1% |
| 0.49 | 2 | < 0.1% |
| 0.99 | 90946 | |
| 1.29 | 4021 | < 0.1% |
| 1.49 | 9795 | 0.1% |
| 1.79 | 1761 | < 0.1% |
| 1.98 | 80510 | |
| 1.99 | 60079 | |
| 2.29 | 7932 | 0.1% |
| 2.49 | 3809 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 26061.14 | 1 | |
| 22994.03 | 1 | |
| 18805.05 | 1 | |
| 18552.93 | 1 | |
| 17697.05 | 1 | |
| 15599.88 | 1 | |
| 15276.24 | 1 | |
| 14487.24 | 1 | |
| 13335.53 | 1 | |
| 13279.97 | 1 |
SalesPrice
Real number (ℝ≥0)
| Distinct | 392 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.83307543 |
| Minimum | 0 |
|---|---|
| Maximum | 5799.99 |
| Zeros | 32 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1.99 |
| Q1 | 8.99 |
| median | 12.99 |
| Q3 | 19.99 |
| 95-th percentile | 37.99 |
| Maximum | 5799.99 |
| Range | 5799.99 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 14.5346574 |
|---|---|
| Coefficient of variation (CV) | 0.917993315 |
| Kurtosis | 10583.20284 |
| Mean | 15.83307543 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 42.4717086 |
| Sum | 184994729.9 |
| Variance | 211.2562657 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 9.99 | 1047663 | 9.0% |
| 12.99 | 669786 | 5.7% |
| 10.99 | 645709 | 5.5% |
| 8.99 | 520170 | 4.5% |
| 11.99 | 494928 | 4.2% |
| 0.99 | 487701 | 4.2% |
| 19.99 | 477498 | 4.1% |
| 14.99 | 466267 | 4.0% |
| 7.99 | 439818 | 3.8% |
| 13.99 | 430703 | 3.7% |
| Other values (382) | 6003825 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 32 | < 0.1% |
| 0.49 | 5 | < 0.1% |
| 0.99 | 487701 | |
| 1.29 | 21918 | 0.2% |
| 1.49 | 58134 | 0.5% |
| 1.79 | 5497 | < 0.1% |
| 1.99 | 242915 | |
| 2.29 | 25946 | 0.2% |
| 2.49 | 10033 | 0.1% |
| 2.79 | 9171 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 5799.99 | 1 | < 0.1% |
| 4999.99 | 2 | < 0.1% |
| 4696.99 | 3 | < 0.1% |
| 3499.99 | 1 | < 0.1% |
| 3399.99 | 1 | < 0.1% |
| 3049.99 | 10 | |
| 2999.99 | 3 | < 0.1% |
| 2699.99 | 2 | < 0.1% |
| 2199.99 | 3 | < 0.1% |
| 1999.99 | 3 | < 0.1% |
SalesDate
Date
| Distinct | 364 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 178.3 MiB |
| Minimum | 2016-01-01 00:00:00 |
|---|---|
| Maximum | 2016-12-31 00:00:00 |
Histogram with fixed size bins (bins=50)
Volume
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 963.475969 |
| Minimum | 50 |
|---|---|
| Maximum | 20000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 750 |
| median | 750 |
| Q3 | 1500 |
| 95-th percentile | 1750 |
| Maximum | 20000 |
| Range | 19950 |
| Interquartile range (IQR) | 750 |
Descriptive statistics
| Standard deviation | 707.1605581 |
|---|---|
| Coefficient of variation (CV) | 0.7339680292 |
| Kurtosis | 13.68178308 |
| Mean | 963.475969 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.622212629 |
| Sum | 1.125731874 × 10^10 |
| Variance | 500076.055 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
Common Values
| Value | Count | Frequency (%) |
| 750 | 6663894 | |
| 1750 | 1976333 | 16.9% |
| 50 | 1053037 | 9.0% |
| 1500 | 734451 | 6.3% |
| 375 | 595987 | 5.1% |
| 1000 | 207512 | 1.8% |
| 3000 | 153229 | 1.3% |
| 5000 | 130108 | 1.1% |
| 187 | 54959 | 0.5% |
| 100 | 29415 | 0.3% |
| Other values (12) | 85143 | 0.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 50 | 1053037 | |
| 100 | 29415 | 0.3% |
| 150 | 2499 | < 0.1% |
| 180 | 599 | < 0.1% |
| 187 | 54959 | 0.5% |
| 200 | 23227 | 0.2% |
| 250 | 4584 | < 0.1% |
| 300 | 6538 | 0.1% |
| 330 | 1315 | < 0.1% |
| 375 | 595987 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 20000 | 1 | < 0.1% |
| 18000 | 99 | < 0.1% |
| 5000 | 130108 | 1.1% |
| 4000 | 16579 | 0.1% |
| 3000 | 153229 | 1.3% |
| 1750 | 1976333 | 16.9% |
| 1500 | 734451 | 6.3% |
| 1000 | 207512 | 1.8% |
| 750 | 6663894 | |
| 720 | 1788 | < 0.1% |
Classification
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 735.4 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 11684068 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
Length
Histogram of lengths of the category
Category Frequency Plot
Words
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 11684068 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 11684068 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11684068 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 6939639 | |
| 2 | 4744429 |
ExciseTax
Real number (ℝ≥0)
| Distinct | 1132 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.535389381 |
| Minimum | 0.01 |
|---|---|
| Maximum | 1260.52 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 0.01 |
|---|---|
| 5-th percentile | 0.1 |
| Q1 | 0.22 |
| median | 0.79 |
| Q3 | 1.57 |
| 95-th percentile | 5.51 |
| Maximum | 1260.52 |
| Range | 1260.51 |
| Interquartile range (IQR) | 1.35 |
Descriptive statistics
| Standard deviation | 4.706324094 |
|---|---|
| Coefficient of variation (CV) | 3.06523163 |
| Kurtosis | 2355.142099 |
| Mean | 1.535389381 |
| Median Absolute Deviation (MAD) | 0.68 |
| Skewness | 29.34954166 |
| Sum | 17939593.93 |
| Variance | 22.14948648 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 0.79 | 2210129 | |
| 0.11 | 1937494 | |
| 0.22 | 1145104 | 9.8% |
| 1.84 | 1020893 | 8.7% |
| 1.57 | 623400 | 5.3% |
| 0.45 | 440911 | 3.8% |
| 3.67 | 393659 | 3.4% |
| 0.39 | 352851 | 3.0% |
| 0.34 | 311980 | 2.7% |
| 0.05 | 263297 | 2.3% |
| Other values (1122) | 2984350 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0.01 | 672 | < 0.1% |
| 0.02 | 711 | < 0.1% |
| 0.03 | 29328 | 0.3% |
| 0.04 | 4163 | < 0.1% |
| 0.05 | 263297 | |
| 0.06 | 41293 | 0.4% |
| 0.07 | 36 | < 0.1% |
| 0.08 | 19927 | 0.2% |
| 0.09 | 896 | < 0.1% |
| 0.1 | 229023 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1260.52 | 1 | |
| 909.56 | 1 | |
| 790.12 | 1 | |
| 731.85 | 1 | |
| 731.32 | 1 | |
| 687.22 | 1 | |
| 676.2 | 1 | |
| 670.69 | 1 | |
| 663.34 | 1 | |
| 661.5 | 1 |
VendorNo
Real number (ℝ≥0)
| Distinct | 117 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7043.213783 |
| Minimum | 2 |
|---|---|
| Maximum | 173357 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 516 |
| Q1 | 3252 |
| median | 4425 |
| Q3 | 9552 |
| 95-th percentile | 17035 |
| Maximum | 173357 |
| Range | 173355 |
| Interquartile range (IQR) | 6300 |
Descriptive statistics
| Standard deviation | 8419.868134 |
|---|---|
| Coefficient of variation (CV) | 1.195458266 |
| Kurtosis | 71.01064229 |
| Mean | 7043.213783 |
| Median Absolute Deviation (MAD) | 3297 |
| Skewness | 7.248125247 |
| Sum | 8.229338878 × 10^10 |
| Variance | 70894179.4 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
| 3960 | 1393773 | 11.9% |
| 12546 | 1036396 | 8.9% |
| 1392 | 826097 | 7.1% |
| 4425 | 821478 | 7.0% |
| 3252 | 696401 | 6.0% |
| 17035 | 588848 | 5.0% |
| 480 | 514597 | 4.4% |
| 9552 | 504700 | 4.3% |
| 8004 | 455740 | 3.9% |
| 9165 | 411105 | 3.5% |
| Other values (107) | 4434933 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 2 | 6 | < 0.1% |
| 60 | 32 | < 0.1% |
| 105 | 255 | < 0.1% |
| 200 | 3 | < 0.1% |
| 287 | 90 | < 0.1% |
| 388 | 1262 | < 0.1% |
| 480 | 514597 | |
| 516 | 104065 | 0.9% |
| 653 | 57755 | 0.5% |
| 660 | 203919 | 1.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 173357 | 296 | < 0.1% |
| 98450 | 6813 | |
| 90058 | 7094 | |
| 90057 | 506 | < 0.1% |
| 90056 | 9824 | |
| 90053 | 1592 | < 0.1% |
| 90052 | 395 | < 0.1% |
| 90051 | 568 | < 0.1% |
| 90047 | 9175 | |
| 90046 | 1838 | < 0.1% |
VendorName
Categorical
| Distinct | 118 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1020.1 MiB |
Length
| Max length | 39 |
|---|---|
| Median length | 27 |
| Mean length | 26.55049337 |
| Min length | 10 |
Characters and Unicode
| Total characters | 310217770 |
|---|---|
| Distinct characters | 46 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 4 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | JIM BEAM BRANDS COMPANY |
|---|---|
| 2nd row | JIM BEAM BRANDS COMPANY |
| 3rd row | JIM BEAM BRANDS COMPANY |
| 4th row | JIM BEAM BRANDS COMPANY |
| 5th row | JIM BEAM BRANDS COMPANY |
Common Values
| Value | Count | Frequency (%) |
| DIAGEO NORTH AMERICA INC | 1393773 | 11.9% |
| JIM BEAM BRANDS COMPANY | 1036396 | 8.9% |
| CONSTELLATION BRANDS INC | 826097 | 7.1% |
| MARTIGNETTI COMPANIES | 819715 | 7.0% |
| E & J GALLO WINERY | 696401 | 6.0% |
| PERNOD RICARD USA | 588848 | 5.0% |
| BACARDI USA INC | 514597 | 4.4% |
| M S WALKER INC | 504700 | 4.3% |
| SAZERAC CO INC | 455740 | 3.9% |
| ULTRA BEVERAGE COMPANY LLP | 411105 | 3.5% |
| Other values (108) | 4436696 |
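The Frequency (%) column follows directly from the counts and the number of observations. The sketch below recomputes it for the ten vendors listed above and derives the "Other values" count as the remainder. The variable names are illustrative, and the one-decimal formatting is an assumption that matches the values shown in the table.

```python
# Number of observations reported in the dataset statistics.
n_rows = 11684068

# Top-10 VendorName counts copied from the Common Values table.
top_counts = {
    "DIAGEO NORTH AMERICA INC": 1393773,
    "JIM BEAM BRANDS COMPANY": 1036396,
    "CONSTELLATION BRANDS INC": 826097,
    "MARTIGNETTI COMPANIES": 819715,
    "E & J GALLO WINERY": 696401,
    "PERNOD RICARD USA": 588848,
    "BACARDI USA INC": 514597,
    "M S WALKER INC": 504700,
    "SAZERAC CO INC": 455740,
    "ULTRA BEVERAGE COMPANY LLP": 411105,
}

# Share of all rows for each listed vendor, formatted to one decimal place.
shares = {name: f"{count / n_rows:.1%}" for name, count in top_counts.items()}
for name, share in shares.items():
    print(name, share)

# Rows belonging to the remaining 108 vendors.
other = n_rows - sum(top_counts.values())
print("Other values:", other)
```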
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| inc | 5382025 | 13.5% |
| brands | 1983044 | 5.0% |
| america | 1772741 | 4.4% |
| north | 1597692 | 4.0% |
| company | 1470434 | 3.7% |
| diageo | 1465801 | 3.7% |
| usa | 1371131 | 3.4% |
| & | 1112919 | 2.8% |
| jim | 1036396 | 2.6% |
| beam | 1036396 | 2.6% |
| Other values (207) | 21636577 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 101087414 | 32.6% |
| A | 23051925 | 7.4% |
| I | 21244386 | 6.8% |
| N | 20353497 | 6.6% |
| E | 17957587 | 5.8% |
| R | 16670846 | 5.4% |
| C | 15077074 | 4.9% |
| O | 13207984 | 4.3% |
| S | 11339871 | 3.7% |
| T | 11124336 | 3.6% |
| Other values (36) | 59102850 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 206625266 | 66.6% |
| Space Separator | 101087414 | 32.6% |
| Other Punctuation | 1790317 | 0.6% |
| Dash Punctuation | 341366 | 0.1% |
| Lowercase Letter | 181029 | 0.1% |
| Open Punctuation | 96189 | < 0.1% |
| Close Punctuation | 96189 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 23051925 | |
| I | 21244386 | |
| N | 20353497 | |
| E | 17957587 | 8.7% |
| R | 16670846 | 8.1% |
| C | 15077074 | 7.3% |
| O | 13207984 | 6.4% |
| S | 11339871 | 5.5% |
| T | 11124336 | 5.4% |
| M | 9503790 | 4.6% |
| Other values (16) | 47093970 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 38634 | |
| s | 26623 | |
| r | 20811 | |
| d | 18597 | |
| e | 14612 | 8.1% |
| n | 14359 | 7.9% |
| l | 13626 | 7.5% |
| i | 7010 | 3.9% |
| u | 6586 | 3.6% |
| o | 6586 | 3.6% |
| Other values (3) | 13585 | 7.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| & | 1112919 | |
| . | 575615 | |
| , | 101783 | 5.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 101087414 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 341366 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 96189 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 96189 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 206806295 | |
| Common | 103411475 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 23051925 | |
| I | 21244386 | |
| N | 20353497 | |
| E | 17957587 | 8.7% |
| R | 16670846 | 8.1% |
| C | 15077074 | 7.3% |
| O | 13207984 | 6.4% |
| S | 11339871 | 5.5% |
| T | 11124336 | 5.4% |
| M | 9503790 | 4.6% |
| Other values (29) | 47274999 |
Common
| Value | Count | Frequency (%) |
| (space) | 101087414 | 97.8% |
| & | 1112919 | 1.1% |
| . | 575615 | 0.6% |
| - | 341366 | 0.3% |
| , | 101783 | 0.1% |
| ( | 96189 | 0.1% |
| ) | 96189 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 310217770 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 101087414 | 32.6% |
| A | 23051925 | 7.4% |
| I | 21244386 | 6.8% |
| N | 20353497 | 6.6% |
| E | 17957587 | 5.8% |
| R | 16670846 | 5.4% |
| C | 15077074 | 4.9% |
| O | 13207984 | 4.3% |
| S | 11339871 | 3.7% |
| T | 11124336 | 3.6% |
| Other values (36) | 59102850 |
City
Categorical
| Distinct | 67 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 822.3 MiB |
| MOUNTMEND | 884318 |
|---|---|
| DONCASTER | 846760 |
| EANVERNESS | 833123 |
| GOULCREST | 555501 |
| HORNSEY | 521485 |
| Other values (62) | 8042881 |
Length
| Max length | 13 |
|---|---|
| Median length | 12 |
| Mean length | 8.796688619 |
| Min length | 4 |
Characters and Unicode
| Total characters | 102781108 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | HARDERSFIELD |
|---|---|
| 2nd row | HARDERSFIELD |
| 3rd row | HARDERSFIELD |
| 4th row | HARDERSFIELD |
| 5th row | HARDERSFIELD |
Common Values
| Value | Count | Frequency (%) |
| MOUNTMEND | 884318 | 7.6% |
| DONCASTER | 846760 | 7.2% |
| EANVERNESS | 833123 | 7.1% |
| GOULCREST | 555501 | 4.8% |
| HORNSEY | 521485 | 4.5% |
| PITMERDEN | 381510 | 3.3% |
| HARDERSFIELD | 360347 | 3.1% |
| IRRAGIN | 267212 | 2.3% |
| LARNWICK | 263285 | 2.3% |
| WANBORNE | 244581 | 2.1% |
| Other values (57) | 6525946 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| mountmend | 884318 | 7.2% |
| doncaster | 846760 | 6.9% |
| eanverness | 833123 | 6.8% |
| goulcrest | 555501 | 4.5% |
| hornsey | 521485 | 4.3% |
| pitmerden | 381510 | 3.1% |
| hardersfield | 360347 | 2.9% |
| irragin | 267212 | 2.2% |
| larnwick | 263285 | 2.1% |
| wanborne | 244581 | 2.0% |
| Other values (62) | 7106595 |
Most occurring characters
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.1% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.6% |
| O | 6906316 | 6.7% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.3% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (14) | 27467110 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 101943978 | |
| Space Separator | 580649 | 0.6% |
| Other Punctuation | 256481 | 0.2% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.2% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.7% |
| O | 6906316 | 6.8% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.4% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (12) | 26629980 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 580649 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 256481 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 101943978 | |
| Common | 837130 | 0.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.2% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.7% |
| O | 6906316 | 6.8% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.4% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (12) | 26629980 |
Common
| Value | Count | Frequency (%) |
| (space) | 580649 | 69.4% |
| ' | 256481 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 102781108 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| E | 12492906 | |
| R | 10407731 | 10.1% |
| N | 9303426 | 9.1% |
| A | 8837376 | 8.6% |
| O | 6906316 | 6.7% |
| L | 6549263 | 6.4% |
| S | 6497537 | 6.3% |
| T | 5722509 | 5.6% |
| D | 4414554 | 4.3% |
| C | 4182380 | 4.1% |
| Other values (14) | 27467110 |
SalesDateMonth
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.672609232 |
| Minimum | 1 |
|---|---|
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 178.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.4488207 |
|---|---|
| Coefficient of variation (CV) | 0.5168623818 |
| Kurtosis | -1.176147685 |
| Mean | 6.672609232 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.05055144269 |
| Sum | 77963220 |
| Variance | 11.89436422 |
| Monotonicity | Not monotonic |
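These figures can be cross-checked from the per-month row counts listed in the frequency tables for this variable. The following is a minimal sketch, not the profiler's own implementation: the variable names are illustrative, and the population variance is used, which for 11,684,068 rows differs only negligibly from the sample variance.

```python
import math

# Row counts per month (1..12), copied from the SalesDateMonth frequency tables.
month_counts = {
    1: 931294, 2: 879310, 3: 890526, 4: 900484,
    5: 969355, 6: 986161, 7: 1102646, 8: 1014932,
    9: 966188, 10: 941689, 11: 932305, 12: 1169178,
}

n = sum(month_counts.values())                        # number of observations
total = sum(m * c for m, c in month_counts.items())   # the "Sum" statistic
mean = total / n

# Population variance from grouped counts (assumption: ddof=0).
variance = sum(c * (m - mean) ** 2 for m, c in month_counts.items()) / n
std = math.sqrt(variance)
cv = std / mean                                       # coefficient of variation

print(n, total)
print(round(mean, 6), round(std, 4), round(cv, 4))
```

The coefficient of variation is simply the standard deviation divided by the mean, which is why it is dimensionless.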
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 12 | 1169178 | 10.0% |
| 7 | 1102646 | 9.4% |
| 8 | 1014932 | 8.7% |
| 6 | 986161 | 8.4% |
| 5 | 969355 | 8.3% |
| 9 | 966188 | 8.3% |
| 10 | 941689 | 8.1% |
| 11 | 932305 | 8.0% |
| 1 | 931294 | 8.0% |
| 4 | 900484 | 7.7% |
| Other values (2) | 1769836 | 15.1% |
| Value | Count | Frequency (%) |
| 1 | 931294 | 8.0% |
| 2 | 879310 | 7.5% |
| 3 | 890526 | 7.6% |
| 4 | 900484 | 7.7% |
| 5 | 969355 | 8.3% |
| 6 | 986161 | 8.4% |
| 7 | 1102646 | 9.4% |
| 8 | 1014932 | 8.7% |
| 9 | 966188 | 8.3% |
| 10 | 941689 | 8.1% |
| Value | Count | Frequency (%) |
| 12 | 1169178 | 10.0% |
| 11 | 932305 | 8.0% |
| 10 | 941689 | 8.1% |
| 9 | 966188 | 8.3% |
| 8 | 1014932 | 8.7% |
| 7 | 1102646 | 9.4% |
| 6 | 986161 | 8.4% |
| 5 | 969355 | 8.3% |
| 4 | 900484 | 7.7% |
| 3 | 890526 | 7.6% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and 1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
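The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation rather than the code the report itself uses; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are assumptions, and tied values are given the average of their rank positions.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance over the product of standard deviations (1/(n-1) factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho is Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (not from the dataset): y = x**3 is monotonic but nonlinear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # approximately 1: the relationship is perfectly monotonic
print(pearson_r(x, y))     # below 1: the relationship is not linear
```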
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
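Written out, the definition and the invariance property stated above look as follows. The constants a, b, c and d are introduced here only for illustration, and the scale factors are assumed positive (a negative factor would flip the sign of r):

$$
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
r_{aX+b,\; cY+d} = \frac{ac \, \operatorname{cov}(X, Y)}{(a\,\sigma_X)(c\,\sigma_Y)} = r_{XY}
\quad (a, c > 0).
$$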
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and 1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
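The pair-counting rule above can be sketched directly. The function name and toy data are illustrative assumptions. The sketch follows the formula exactly as stated (often called τ-a); library implementations may apply tie corrections and so give different values when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables order the pair the same way
        elif s < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # s == 0 is a tie: it counts toward the total but toward neither tally
    return (concordant - discordant) / len(pairs)

# Toy data (not from the dataset): two adjacent swaps in an otherwise sorted list.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))  # 8 concordant, 2 discordant, 10 pairs -> 0.6
```

Enumerating all pairs costs quadratic time, so this form is only practical for small samples, not for a table of this size.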
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | InventoryId | Store | Brand | Description | Size | SalesQuantity | SalesDollars | SalesPrice | SalesDate | Volume | Classification | ExciseTax | VendorNo | VendorName | City | SalesDateMonth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 16.49 | 16.49 | 2016-01-01 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1 |
| 1 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 2 | 32.98 | 16.49 | 2016-01-02 | 750.0 | 1 | 1.57 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1 |
| 2 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 16.49 | 16.49 | 2016-01-03 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1 |
| 3 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-01-08 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1 |
| 4 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 2 | 28.98 | 14.49 | 2016-02-09 | 750.0 | 1 | 1.57 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
| 5 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 3 | 43.47 | 14.49 | 2016-02-10 | 750.0 | 1 | 2.36 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
| 6 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-11 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
| 7 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-16 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
| 8 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-17 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
| 9 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-20 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2 |
Last rows
| | InventoryId | Store | Brand | Description | Size | SalesQuantity | SalesDollars | SalesPrice | SalesDate | Volume | Classification | ExciseTax | VendorNo | VendorName | City | SalesDateMonth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11684058 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-16 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684059 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 4 | 59.96 | 14.99 | 2016-12-17 | 50.0 | 1 | 0.21 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684060 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 1 | 14.99 | 14.99 | 2016-12-19 | 50.0 | 1 | 0.05 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684061 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-20 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684062 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 3 | 44.97 | 14.99 | 2016-12-21 | 50.0 | 1 | 0.16 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684063 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-22 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12 |
| 11684064 | 8_ALNERWICK_36138 | 8 | 36138 | Renzo Masi Chianti Rufina | 750mL | 2 | 21.98 | 10.99 | 2016-12-27 | 750.0 | 2 | 0.22 | 10754 | PERFECTA WINES | ALNERWICK | 12 |
| 11684065 | 8_ALNERWICK_652 | 8 | 652 | Frangelico Gift Pack-Candle | 750mL | 1 | 21.99 | 21.99 | 2016-12-21 | 750.0 | 1 | 0.79 | 11567 | CAMPARI AMERICA | ALNERWICK | 12 |
| 11684066 | 9_BLACKPOOL_3204 | 9 | 3204 | Twenty 2 Vodka | 750mL | 1 | 21.99 | 21.99 | 2016-12-01 | 750.0 | 1 | 0.79 | 7153 | PINE STATE TRADING CO | BLACKPOOL | 12 |
| 11684067 | 9_BLACKPOOL_3204 | 9 | 3204 | Twenty 2 Vodka | 750mL | 1 | 21.99 | 21.99 | 2016-12-04 | 750.0 | 1 | 0.79 | 7153 | PINE STATE TRADING CO | BLACKPOOL | 12 |